WIP: Clean version of semi-supervised PR#14
Conversation
…sfer-learning-wsj-rm Conflicts: egs/wsj/s5/steps/nnet3/xconfig_to_configs.py
…vised Travis was failing to compile (not sure why) -- I used the "Update Branch" button
@hhadian Could you please review this PR?

Sure, I will do it.
# Copyright 2017 Vimal Manohar
# Apache 2.0

# This is fisher chain recipe for training a model on a subset of around 100 hours.
Does this script use 100 hours of supervised training data? Add a better description, e.g. "this script uses 100 hours of supervised data".
exp=exp/semisup_100k
gmm=tri4a
xent_regularize=0.1
hidden_dim=725
Isn't that large? We use 625 for the 300-hour Switchboard data.
num_epochs=4
remove_egs=false
common_egs_dir=
minibatch_size=128
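For context, variables like these are usually plumbed into the standard chain training call. A rough sketch follows; the actual call is not quoted in this thread, so the directory variables ($dir, $treedir, $lat_dir, $train_ivector_dir, $train_set) and the exact option set are assumptions based on the usual steps/nnet3/chain/train.py interface, not the PR itself.

# Sketch only -- reconstructed from the variables above, not quoted from the PR.
steps/nnet3/chain/train.py --stage $train_stage \
  --cmd "$decode_cmd" \
  --feat.online-ivector-dir $train_ivector_dir \
  --chain.xent-regularize $xent_regularize \
  --trainer.num-epochs $num_epochs \
  --trainer.num-chunk-per-minibatch $minibatch_size \
  --egs.dir "$common_egs_dir" \
  --cleanup.remove-egs $remove_egs \
  --feat-dir data/${train_set}_sp_hires \
  --tree-dir $treedir \
  --lat-dir $lat_dir \
  --dir $dir || exit 1;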
set -e

# This is an oracle experiment using oracle transcription of 250 hours of
# unsupervised data, along with 100 hours of supervised data.
I think you can easily use run_tdnn_100k_a.sh with the new combined dataset; I am not sure why you need two separate scripts.
I agree. In the initial PR, there was only one TDNN recipe, which was called with different training data sets during semi-supervised training. Separating the scripts might be clearer, but it will add too many very similar scripts.
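To make the single-recipe approach concrete, a sketch; the script name and options below are illustrative placeholders, not files from this PR.

# Illustrative only: one TDNN chain recipe reused for both experiments by
# pointing it at different training data sets.
local/chain/run_tdnn.sh --train-set train_sup15k --gmm tri3
local/chain/run_tdnn.sh --train-set train_sup15k_250k_oracle --gmm tri3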
exp=exp/semisup_15k
gmm=tri3
xent_regularize=0.1
hidden_dim=500
Did you try reducing it to a smaller size or reducing the number of layers?
I guess I tried smaller sizes but it did not help much.
I think the Fisher dev and test sets are more similar to the training data than eval2000 and Switchboard are, so it might be OK to use a larger network.
# Semi-supervised options
comb_affix=comb1am  # affix for new chain-model directory trained on the combined supervised+unsupervised subsets
supervision_weights=1.0,1.0
# Neural network opts
apply_deriv_weights=true
xent_regularize=0.1
hidden_dim=725
Did you try to tune it? My guess is that the hidden dim is too large.
apply_deriv_weights=true
xent_regularize=0.1
hidden_dim=725
minibatch_size="150=128/300=64"
Is it better than using minibatch_size=150,300?
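For reference, my reading of the two formats; this is an assumption about the egs-merging rule syntax rather than something stated in the thread.

# Assumed semantics (sketch, not a spec):
#   "150=128/300=64"  -- egs of chunk size 150 are merged into minibatches of
#                        128, and egs of chunk size 300 into minibatches of 64,
#                        which lets differently-sized supervised/unsupervised
#                        egs get different minibatch sizes.
#   "150,300"         -- a plain list of allowed minibatch sizes applied to egs
#                        of every size, so the two kinds of egs would not be
#                        treated differently.
minibatch_size="150=128/300=64"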
relu-batchnorm-layer name=prefinal-xent input=tdnn6 dim=$hidden_dim target-rms=0.5
output-layer name=output-xent dim=$num_targets learning-rate-factor=$learning_rate_factor max-change=1.5

output name=output-0 input=output.affine skip-in-init=true
Why do you have two separate output nodes? Do you use weighted training? The supervision weights were the same in the script.
What is the skip-in-init option? You could add a single-line comment about this output in the config file.
skip-in-init is added to prevent the output line (a trivial output layer) from being printed in init.config. Do you know why the trivial output layers are needed in init.config?
I think we just use init.config for training the LDA matrix, so we don't need to add other outputs. We could probably modify the xconfig code to not print trivial outputs in init.config (we just need to print output-node name=output).
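For context, a rough reconstruction of the block being discussed; only the output-0 line is quoted above, and the remaining lines are my assumption that the other outputs follow the same pattern.

# Sketch, not a verbatim quote from the PR: one chain output and one xent
# output per supervision source, all sharing the regular output layers'
# affine / log-softmax, with skip-in-init keeping these trivial outputs out
# of init.config (which is only used to estimate the LDA-like preconditioning
# matrix).
cat <<EOF >> $dir/configs/network.xconfig
  output name=output-0 input=output.affine skip-in-init=true
  output name=output-1 input=output.affine skip-in-init=true
  output name=output-0-xent input=output-xent.log-softmax skip-in-init=true
  output name=output-1-xent input=output-xent.log-softmax skip-in-init=true
EOF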
--lattice-prune-beam "$lattice_prune_beam" \
--phone-insertion-penalty "$phone_insertion_penalty" \
--deriv-weights-scp $chaindir/best_path_${unsupervised_set}${decode_affix}/weights.scp \
--online-ivector-dir $exp/nnet3${nnet3_affix}/ivectors_${semisup_train_set}_sp_hires \
Do you use supervised data for ivector training?
hhadian left a comment
I was under the impression that this PR could be wrapped up in 20-30 changed files (new or modified). I'm not sure, but I feel 70 changed files is too many.
train_lm.sh --arpa --lmtype 3gram-mincount $dir || exit 1;

train_lm.sh --arpa --lmtype 4gram-mincount $dir || exit 1;
If you are using pocolm, it might be better to leave this script unchanged (to make the PR smaller)
@@ -0,0 +1,201 @@
#!/bin/bash
I guess it would be nicer to keep only one version of everything (even though it's in tuning), so no _i, _a, etc., because there are too many files in this PR.
std::string wav_rspecifier = po.GetArg(1);
std::string wav_wspecifier = po.GetArg(2);
if (ClassifyRspecifier(po.GetArg(1), NULL, NULL) != kNoRspecifier) {
This change was for perturb_to_allowed_lengths.py. I guess you are not using that script (i.e. non-split training), so it might be better to leave this file unchanged.
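For reference, my understanding of what the ClassifyRspecifier() branch enables; the command lines below are an illustrative sketch, not quoted from the PR.

# Illustrative usage of the two modes implied by the branch above:
wav-copy scp:data/train/wav.scp ark:wavs.ark   # table mode: rspecifier -> wspecifier
wav-copy input.wav output.wav                  # single-file mode: rxfilename -> wxfilename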
if (token == "<DW>")
  ReadVectorAsChar(is, binary, &deriv_weights);
else
  deriv_weights.Read(is, binary);
<DW> reads only 0s and 1s; <DW2> reads and writes as float, which is needed for this.
BTW, I noticed your …